A Framework for Authorship Identification in the Internet Environment

Authors

  • Jan Rygl
  • Ales Horák
Abstract

Misuse of anonymous online communication for illegal purposes has become a major concern [2,12]. In this paper, we present a framework named ART (Authorship Recognition Tool) that is designed to minimize manual procedures and maximize the efficiency of authorship identification based on the content of Internet electronic documents. The framework covers the phases of document retrieval and database document management. ART provides implementations of an efficient authorship identification algorithm and an authorship similarity algorithm, including the possibility of obtaining extra data for learning and testing. The framework also determines whether different author identities are interlinked. Authorship is analysed by machine learning and natural language processing methods. Technical information such as the IP address is considered only as an optional attribute for the machine learning, because it can easily be forged, or devalued if the author communicates from public places or through proxy servers. The choice of algorithm for determining the authorship of an anonymous document depends on the document's language.
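
To make the idea of an authorship similarity algorithm concrete, here is a minimal sketch in Python. It is not ART's actual method, which the abstract does not specify. It assumes a common stylometric baseline: character trigram profiles compared by cosine similarity. The function names (`char_ngrams`, `cosine_similarity`, `most_similar_author`) and the sample data are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar_author(anonymous_text: str, known_texts: dict[str, str]) -> tuple[str, float]:
    """Return the candidate author whose profile is closest to the anonymous text.

    Assumption: known_texts maps an author name to one sample of that author's writing.
    """
    target = char_ngrams(anonymous_text)
    scores = {
        author: cosine_similarity(target, char_ngrams(sample))
        for author, sample in known_texts.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    candidates = {
        "author_a": "the quick brown fox jumps over the lazy dog",
        "author_b": "1234567890 0987654321 1122334455",
    }
    print(most_similar_author("the lazy dog jumps over the quick fox", candidates))
```

Character n-grams are used here because they need no language-specific tokenizer. A language-dependent system such as the one the abstract describes would presumably choose features and algorithms per language, and could add optional attributes such as the IP address alongside the textual features.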

Similar resources

Identification in Cyberspace as a Main Challenge of e-Government (A Legal Approach to e-Identity Management System)

The penetration and growth of the Internet, as a key technology of the 21st century, has caused major changes in all individual and social aspects of human life. Some of these changes, which are associated with the evolution of concepts, including legal and political ones, have become a challenge. Identity as a legal concept and government as a political concept are in those fields whi...

A Comparative Evaluation of the Efficiency of the Metadata Structure of Digital Identifier Systems

The main solution to the problems of persistency and uniqueness in the identification of digital objects in a web environment is to use digital identifiers instead of URLs. The main basis of this solution is the resolution mechanism used in digital identifier systems. Resolution is the use of indirect names instead of URLs; what worked for the DNS (Domain Name System) in stabilizing i...

Feeling may separate Two Authors: Incorporating Sentiment in Authorship Identification Task

The modern era has become highly advanced through the use of the internet; blogs, social networks, online forums, and email in particular are gaining immense popularity. Thus, authorship identification is being used not only in such areas but also for forensic analysis and the humanities. In this paper, we have proposed a framework for authorship identification by including the sentime...

A Novel Intelligent System for Identifying Persian-Language Authors Based on Writing Style - Selected Paper of the 17th National Conference of the Computer Society of Iran

The rapid development of communication over the Internet and the misuse of the anonymity inherent in online written documents have led to serious security issues. The anonymity offered by Internet tools such as emails, blogs, and Web sites has made them attractive means for criminal activities. On the other hand, world social and political relations have made a great int...

A Process for Developing the Statement of Internet Research Ethics based on Action Research Method

Background: Research ethics in cyberspace, or Internet research ethics (IRE), is a subset of applied ethics that aims to study, introduce, and apply ethical codes for guiding research activities in cyberspace. The compilation of the ethical statement is based on two methods: documentary research and action research. The action research process is implemented in four stages: 1) diagnosis, 2) act...

Publication date: 2011